Add CloudWatch SEARCH() expression support for dynamic metric alarms #143

Open
jaredbrook wants to merge 5 commits into master from feature/dimension-wildcards

@jaredbrook
Member

No description provided.

Copilot AI left a comment

Pull request overview

Adds support for CloudWatch SEARCH()-based dynamic metric alarms so alarms can target changing resources (e.g., replaced Auto Scaling Groups) without requiring fixed dimensions.

Changes:

  • Extend alarm models to carry search_expression / search_aggregation, plus interpolation support for ${Resource::...} inside search expressions.
  • Emit CloudFormation alarms using Metrics (metric math) when search_expression is present, and relax validation to not require MetricName/Namespace in that case.
  • Add documentation and RSpec coverage for search-expression alarms; bump gem version/dependencies.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| spec/search_expression_spec.rb | New specs covering model fields, template emission, interpolation, and compile-time validation behavior |
| lib/cfnguardian/models/alarm.rb | Add search_expression / search_aggregation fields to alarms |
| lib/cfnguardian/resources/base.rb | Interpolate ${Resource::...} variables inside search expressions |
| lib/cfnguardian/stacks/resources.rb | Emit AWS::CloudWatch::Alarm using Metrics/expressions when search_expression is set |
| lib/cfnguardian/compile.rb | Skip metric_name/namespace validation for search-expression alarms; validate empty SearchExpression |
| docs/search_expressions.md | New documentation describing configuration and limitations |
| docs/variables.md | Document variable support inside search expressions |
| docs/overview.md | Add Search Expression Alarms to docs TOC |
| lib/cfnguardian/version.rb | Bump version to 0.13.0 |
| Gemfile.lock | Lockfile updates for new version and dependency resolutions |

@@ -0,0 +1,215 @@
require 'spec_helper'
require 'json'
require 'yaml'

Copilot AI Apr 17, 2026

This spec uses `FileUtils.mkdir_p` and `FileUtils.rm_f` later but never calls `require 'fileutils'`. Without it, the spec will raise `NameError: uninitialized constant FileUtils` in a clean test environment. Add the missing require (or avoid using FileUtils).

Suggested change
- require 'yaml'
+ require 'yaml'
+ require 'fileutils'

Comment thread spec/search_expression_spec.rb Outdated
expect(search_alarms.length).to eq(1)
expect(search_alarms.first.search_expression).to include('SEARCH')
ensure
FileUtils.rm_f(fixture)

Copilot AI Apr 17, 2026

The spec creates spec/fixtures via FileUtils.mkdir_p(config_dir) but only removes the YAML file in the ensure block. This leaves the fixtures directory behind after the test run. Consider removing the directory as well (when empty) or using a temp directory (e.g., Dir.mktmpdir) to avoid polluting the working tree.

Suggested change
- FileUtils.rm_f(fixture)
+ FileUtils.rm_f(fixture)
+ FileUtils.rmdir(config_dir) if Dir.exist?(config_dir) && Dir.children(config_dir).empty?
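
The temp-directory alternative mentioned in this comment could look roughly like the sketch below. The helper name `with_fixture_file` and the fixture content are hypothetical and not taken from the PR; only `Dir.mktmpdir` from Ruby's standard library is assumed.

```ruby
require 'tmpdir'
require 'yaml'

# Hypothetical helper: writes a YAML fixture into a throwaway directory,
# yields its path, and relies on Dir.mktmpdir to delete the directory
# (and everything in it) when the block finishes.
def with_fixture_file(name, content)
  Dir.mktmpdir('guardian-spec') do |dir|
    path = File.join(dir, name)
    File.write(path, content.to_yaml)
    yield path
  end
end

leftover = with_fixture_file('search_expression_alarms.yaml', { 'Resources' => {} }) do |path|
  YAML.safe_load(File.read(path)) # the fixture is readable inside the block
  File.dirname(path)              # remember the directory so we can check it afterwards
end

# The temporary directory no longer exists once the block has returned.
puts Dir.exist?(leftover)
```

With this shape the spec needs no `ensure` cleanup and never touches `spec/fixtures` in the working tree.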

Comment on lines +127 to 136
# String interpolation for search expressions
if alarm.search_expression.is_a?(String)
alarm.search_expression = alarm.search_expression.gsub(/\${Resource::([A-Za-z]+)}/) do
resource_key = $1
if @resource.has_key?(resource_key)
logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'"
alarm.dimensions[k] = @resource[resource_key]
logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}"
@resource[resource_key]
else
"${Resource::#{resource_key}}"
end

Copilot AI Apr 17, 2026

Search expression interpolation only matches `${Resource::...}` keys made of letters (`/\${Resource::([A-Za-z]+)}/`). This is more restrictive than the dimension-variable interpolation above and contradicts the docs, which say “any key from the resource definition” (keys may contain digits or underscores). Consider broadening the regex (e.g., allow `[A-Za-z0-9_]+`), and prefer `Regexp.last_match(1)` over `$1` for clarity.
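
A minimal sketch of the broadened interpolation, assuming the resource definition is a plain Hash with String keys. The constant `RESOURCE_VARIABLE` and the method `interpolate_search_expression` are illustrative names, not part of the PR.

```ruby
# Matches ${Resource::Key} where Key may contain letters, digits and underscores.
RESOURCE_VARIABLE = /\$\{Resource::([A-Za-z0-9_]+)\}/

# Hypothetical stand-in for the interpolation step in the diff above.
def interpolate_search_expression(expression, resource)
  expression.gsub(RESOURCE_VARIABLE) do |placeholder|
    key = Regexp.last_match(1)
    # Known keys are substituted (converted to a String explicitly);
    # unknown keys are left untouched, mirroring the else branch in the diff.
    resource.key?(key) ? resource[key].to_s : placeholder
  end
end

resource = { 'Id' => 'my-app', 'Port_2' => 8080 }
expression = "SEARCH('{AWS/EC2,AutoScalingGroupName} ${Resource::Id} ${Resource::Port_2} ${Resource::Missing}', 'Maximum', 60)"
puts interpolate_search_expression(expression, resource)
```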

Comment on lines +193 to +196
if resource.search_expression
  if resource.search_expression.empty?
    @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an empty SearchExpression."
  end

Copilot AI Apr 17, 2026

resource.search_expression.empty? assumes search_expression is a String. If a user misconfigures YAML (e.g., SearchExpression: []), this will raise a NoMethodError during validation instead of producing a helpful validation message. Consider validating search_expression.is_a?(String) and treating blank/whitespace-only strings as invalid (e.g., strip.empty?) to keep validation robust.
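
The more defensive check described here can be sketched in isolation as follows. The method name `search_expression_error` and its return convention (an error message or `nil`) are assumptions made for illustration; the PR collects errors in `@errors` instead.

```ruby
# Hypothetical validator: returns an error message, or nil when the value is acceptable.
def search_expression_error(value)
  return nil if value.nil? # property not set, nothing to validate
  return 'SearchExpression must be a string' unless value.is_a?(String)
  return 'SearchExpression must not be blank' if value.strip.empty?

  nil
end

candidates = [
  nil,
  [],
  '   ',
  "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" my-app', 'Maximum', 60)"
]

candidates.each do |candidate|
  puts "#{candidate.inspect} => #{search_expression_error(candidate).inspect}"
end
```

Checking the type before calling `strip` is what keeps a misconfigured `SearchExpression: []` from surfacing as a `NoMethodError`.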

Comment thread lib/cfnguardian/stacks/resources.rb Outdated
Comment on lines +69 to +95
def add_search_expression_alarm(alarm, actions)
  search_expr = alarm.search_expression
  aggregation = alarm.search_aggregation || 'MAX'

  @template.declare do
    CloudWatch_Alarm("#{alarm.resource_hash}#{alarm.group}#{alarm.name.gsub(/[^0-9a-zA-Z]/i, '')}#{alarm.type}"[0..255]) do
      ActionsEnabled true
      AlarmDescription "Guardian alarm #{alarm.name} for the resource #{alarm.resource_id} in alarm group #{alarm.group}"
      AlarmName CfnGuardian::CloudWatch.get_alarm_name(alarm)
      ComparisonOperator alarm.comparison_operator
      EvaluationPeriods alarm.evaluation_periods
      Threshold alarm.threshold
      AlarmActions actions
      OKActions actions unless alarm.ok_action_disabled
      TreatMissingData alarm.treat_missing_data unless alarm.treat_missing_data.nil?
      DatapointsToAlarm alarm.datapoints_to_alarm unless alarm.datapoints_to_alarm.nil?
      Metrics [
        {
          Id: 'search_expression',
          Expression: search_expr,
          ReturnData: false
        },
        {
          Id: 'aggregate',
          Expression: "#{aggregation}(search_expression)",
          ReturnData: true
        }

Copilot AI Apr 17, 2026

search_aggregation is inserted directly into the metric-math expression ("#{aggregation}(search_expression)") without validation/normalization. A typo or lowercase value will generate an invalid CloudWatch expression and fail at deploy time. Consider upcasing the value and validating it against the supported set (MAX/MIN/AVG/SUM), surfacing a compile-time validation error when invalid.
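
One way to normalize and validate the aggregation before it reaches the template, as a sketch. The helper `normalize_aggregation` and its two-element return value are hypothetical; the supported set (MAX/MIN/AVG/SUM) and the `MAX` default are taken from the comment and diff above.

```ruby
VALID_AGGREGATIONS = %w[MAX MIN AVG SUM].freeze

# Hypothetical helper: returns [normalized_value, error_message]; exactly one of the two is nil.
def normalize_aggregation(value, default: 'MAX')
  normalized = (value || default).to_s.strip.upcase
  return [normalized, nil] if VALID_AGGREGATIONS.include?(normalized)

  [nil, "invalid SearchAggregation '#{value}'. Must be one of: #{VALID_AGGREGATIONS.join(', ')}."]
end

[nil, 'avg', 'Sum', 'median'].each do |input|
  value, error = normalize_aggregation(input)
  puts "#{input.inspect} => value=#{value.inspect} error=#{error.inspect}"
end
```

Running this at compile time means a typo is reported alongside other validation errors rather than at deploy time.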

Comment thread lib/cfnguardian/stacks/resources.rb Outdated
Comment on lines +44 to +99
@@ -57,6 +65,38 @@ def add_alarm(alarm)
end
end
end

def add_search_expression_alarm(alarm, actions)
  search_expr = alarm.search_expression
  aggregation = alarm.search_aggregation || 'MAX'

  @template.declare do
    CloudWatch_Alarm("#{alarm.resource_hash}#{alarm.group}#{alarm.name.gsub(/[^0-9a-zA-Z]/i, '')}#{alarm.type}"[0..255]) do
      ActionsEnabled true
      AlarmDescription "Guardian alarm #{alarm.name} for the resource #{alarm.resource_id} in alarm group #{alarm.group}"
      AlarmName CfnGuardian::CloudWatch.get_alarm_name(alarm)
      ComparisonOperator alarm.comparison_operator
      EvaluationPeriods alarm.evaluation_periods
      Threshold alarm.threshold
      AlarmActions actions
      OKActions actions unless alarm.ok_action_disabled
      TreatMissingData alarm.treat_missing_data unless alarm.treat_missing_data.nil?
      DatapointsToAlarm alarm.datapoints_to_alarm unless alarm.datapoints_to_alarm.nil?
      Metrics [
        {
          Id: 'search_expression',
          Expression: search_expr,
          ReturnData: false
        },
        {
          Id: 'aggregate',
          Expression: "#{aggregation}(search_expression)",
          ReturnData: true
        }
      ]
    end
  end
end

Copilot AI Apr 17, 2026

add_search_expression_alarm duplicates most of the CloudWatch_Alarm property wiring from add_standard_alarm, which increases the chance of future drift (e.g., tags/unit/low-sample settings). Consider extracting the common alarm property assignment into a shared helper, and only branching for the mutually-exclusive metric configuration (Dimensions/MetricName/etc vs Metrics).
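
The refactoring suggested here can be illustrated with plain hashes. This is a sketch under stated assumptions, not the project's template DSL: the `Alarm` struct carries only a few of the real model's fields, and `common_alarm_properties`, `metric_properties` and `alarm_properties` are hypothetical names.

```ruby
# Simplified stand-in for the alarm model; fields not passed to .new default to nil.
Alarm = Struct.new(:comparison_operator, :evaluation_periods, :threshold,
                   :metric_name, :namespace, :search_expression, :search_aggregation,
                   keyword_init: true)

# Properties shared by standard and search-expression alarms.
def common_alarm_properties(alarm, actions)
  {
    'ActionsEnabled' => true,
    'ComparisonOperator' => alarm.comparison_operator,
    'EvaluationPeriods' => alarm.evaluation_periods,
    'Threshold' => alarm.threshold,
    'AlarmActions' => actions
  }
end

# Only the mutually exclusive metric configuration branches.
def metric_properties(alarm)
  if alarm.search_expression
    aggregation = alarm.search_aggregation || 'MAX'
    { 'Metrics' => [
      { 'Id' => 'search_expression', 'Expression' => alarm.search_expression, 'ReturnData' => false },
      { 'Id' => 'aggregate', 'Expression' => "#{aggregation}(search_expression)", 'ReturnData' => true }
    ] }
  else
    { 'MetricName' => alarm.metric_name, 'Namespace' => alarm.namespace }
  end
end

def alarm_properties(alarm, actions)
  common_alarm_properties(alarm, actions).merge(metric_properties(alarm))
end

search = Alarm.new(comparison_operator: 'GreaterThanThreshold', evaluation_periods: 5, threshold: 90,
                   search_expression: "SEARCH('{AWS/EC2,AutoScalingGroupName} my-app', 'Maximum', 60)")
puts alarm_properties(search, ['Critical']).keys.inspect
```

A property added to `common_alarm_properties` later is then picked up by both alarm flavours, which is the drift the comment is concerned about.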

Comment thread docs/search_expressions.md Outdated
Comment on lines +11 to +18
Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits `MetricDataQueries` with:

1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name
2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation

## Configuration

Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with `MetricDataQueries`.

Copilot AI Apr 17, 2026

This doc says Guardian “emits MetricDataQueries”, but the CloudFormation AWS::CloudWatch::Alarm property name is Metrics (a list of MetricDataQuery objects). Consider adjusting wording to avoid confusion for readers cross-referencing the CFN spec.

Suggested change
- Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits `MetricDataQueries` with:
- 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name
- 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation
- ## Configuration
- Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with `MetricDataQueries`.
+ Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits the CloudFormation `Metrics` property (a list of `MetricDataQuery` objects) with:
+ 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name
+ 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation
+ ## Configuration
+ Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with the alarm `Metrics` property.

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Comment on lines +184 to +199
context 'when search expression alarm has no metric_name or namespace' do
  it 'does not raise validation errors' do
    fixture = File.join(config_dir, 'search_expression_alarms.yaml')
    FileUtils.mkdir_p(config_dir)
    File.write(fixture, {
      'Resources' => {
        'AutoScalingGroup' => [{ 'Id' => 'my-app-AsgGroup-abc123' }]
      },
      'Templates' => {
        'AutoScalingGroup' => {
          'CPUUtilizationHighBase' => {
            'SearchExpression' => "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" my-app', 'Minimum', 60)",
            'SearchAggregation' => 'MAX'
          },
          'StatusCheckFailed' => false
        }

Copilot AI Apr 17, 2026

This example is described as having “no metric_name or namespace”, but it overrides an existing AutoScalingGroup default alarm (CPUUtilizationHighBase) that already sets both fields, so it doesn’t actually exercise the validation change. Consider adding a new alarm (one that doesn’t exist in defaults) with only SearchExpression set, and assert that validation passes and the created alarm has metric_name (and/or namespace) still nil.

run: |
gem install rspec
rspec
gem install bundler -v 2.4.22

Copilot AI Apr 17, 2026

The workflow installs Bundler 2.4.22, but the lockfile was generated with Bundler 2.3.19 (BUNDLED WITH). This often produces warnings and may cause unintended lockfile churn. Consider aligning the workflow Bundler version with the lockfile, or update the lockfile’s BUNDLED WITH to match the workflow version.

Suggested change
- gem install bundler -v 2.4.22
+ gem install bundler -v 2.3.19
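
To check which Bundler version a lockfile records before pinning it in the workflow, a few lines of Ruby are enough. This is a sketch: `bundled_with` is a hypothetical helper, and the sample lockfile text is abbreviated and made up apart from the `2.3.19` version quoted in the comment.

```ruby
# Extracts the version listed under "BUNDLED WITH" from Gemfile.lock contents.
# Returns nil when the section is missing.
def bundled_with(lockfile_text)
  lockfile_text[/^BUNDLED WITH\n\s+(\S+)/, 1]
end

sample = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:

  BUNDLED WITH
     2.3.19
LOCK

puts bundled_with(sample)
```

In a real checkout the argument would be `File.read('Gemfile.lock')`.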

Comment thread lib/cfnguardian/stacks/resources.rb Outdated
Comment on lines +85 to +89
Metrics [
{
Id: 'search_expression',
Expression: search_expr,
ReturnData: false

Copilot AI Apr 17, 2026

When using the CloudFormation Metrics property (metric math alarms), ThresholdMetricId is required to tell CloudWatch which query ID to evaluate against Threshold. Without it, the generated alarm resource may fail template validation or evaluate an unexpected metric. Set ThresholdMetricId to the ID of the return-data query (likely aggregate).

logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'"
alarm.dimensions[k] = @resource[resource_key]
logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}"
@resource[resource_key]

Copilot AI Apr 17, 2026

The gsub replacement block must return a String. If a resource value is numeric/boolean (e.g., a Port), returning it directly will raise TypeError during interpolation. Convert interpolated values to strings (e.g., @resource[resource_key].to_s) before returning from the block.

Suggested change
- @resource[resource_key]
+ @resource[resource_key].to_s

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Comment on lines +193 to +204
if resource.search_expression
  if !resource.search_expression.is_a?(String) || resource.search_expression.strip.empty?
    @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an invalid SearchExpression. Must be a non-empty string."
  end
  if resource.search_aggregation
    valid_aggregations = %w(MAX MIN AVG SUM)
    normalized = resource.search_aggregation.to_s.upcase
    if valid_aggregations.include?(normalized)
      resource.search_aggregation = normalized
    else
      @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has invalid SearchAggregation '#{resource.search_aggregation}'. Must be one of: #{valid_aggregations.join(', ')}."
    end

Copilot AI Apr 17, 2026

New validation behavior was added for SearchExpression (non-empty string) and SearchAggregation (must be one of MAX/MIN/AVG/SUM and normalized to uppercase), but there are no specs asserting these error cases and normalization. Add RSpec coverage that (1) invalid/blank SearchExpression causes CfnGuardian::ValidationError, and (2) invalid SearchAggregation is rejected while valid lowercase values are normalized.

Comment on lines +22 to +26
| Property | Required | Default | Description |
|---|---|---|---|
| `SearchExpression` | Yes | - | A CloudWatch SEARCH() expression string. Supports `${Resource::...}` [variables](variables.md). |
| `SearchAggregation` | No | `MAX` | Aggregation function applied to the search results. Valid values: `MAX`, `MIN`, `AVG`, `SUM`. |

Copilot AI Apr 17, 2026

The markdown table under "Properties" is malformed (lines start with ||), which renders as an empty first column in most markdown parsers. Use standard table syntax with a single leading/trailing | per row.

Comment thread spec/search_expression_spec.rb Outdated
Comment on lines +4 to +6
require 'fileutils'
require 'tmpdir'
require 'term/ansicolor'

Copilot AI Apr 17, 2026

fileutils and term/ansicolor are required but not used anywhere in this spec. Removing unused requires reduces load time and avoids pulling extra dependencies into the test process unnecessarily.

Suggested change
- require 'fileutils'
- require 'tmpdir'
- require 'term/ansicolor'
+ require 'tmpdir'

Comment thread lib/cfnguardian/stacks/resources.rb Outdated
def add_alarm(alarm)
actions = alarm.alarm_action.kind_of?(Array) ? alarm.alarm_action.map{|action| Ref(action)} : [Ref(alarm.alarm_action)]
actions.concat alarm.maintenance_groups.map {|mg| Ref(mg)} if alarm.maintenance_groups.any?
use_search = alarm.search_expression ? true : false

Copilot AI Apr 17, 2026

use_search is true for any non-nil value, including an empty string. Since '' is truthy in Ruby, this would emit a Metrics-based alarm with a blank Expression, producing an invalid CloudFormation template. Consider basing use_search on alarm.search_expression.is_a?(String) && !alarm.search_expression.strip.empty? (or raising early) to align with validation behavior.

Suggested change
- use_search = alarm.search_expression ? true : false
+ use_search = alarm.search_expression.is_a?(String) && !alarm.search_expression.strip.empty?
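
The difference between the two conditions is easy to see in isolation. In this sketch `search_alarm?` is a hypothetical predicate wrapping the suggested expression; it is not a method from the PR.

```ruby
# Hypothetical predicate mirroring the suggested condition.
def search_alarm?(search_expression)
  search_expression.is_a?(String) && !search_expression.strip.empty?
end

blank = ''
# An empty String is still truthy in Ruby, so the original ternary would select the search branch.
puts(blank ? 'truthy' : 'falsy')

# The stricter predicate rejects blank and missing values and accepts a real expression.
puts search_alarm?(blank)
puts search_alarm?(nil)
puts search_alarm?("SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" my-app', 'Maximum', 60)")
```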

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Expression: "#{aggregation}(search_expression)",
ReturnData: true
}
]

Copilot AI Apr 17, 2026

When using the CloudWatch Alarm Metrics property (metric math), CloudFormation expects a ThresholdMetricId to indicate which MetricDataQuery is evaluated against Threshold/ComparisonOperator. This block builds queries search_expression and aggregate but never sets ThresholdMetricId, which can cause the alarm to fail creation or evaluate the wrong series. Set ThresholdMetricId to the query id that returns data (e.g. aggregate).

Suggested change
- ]
+ ]
+ ThresholdMetricId 'aggregate'


2 participants